Biomolecular network motif counting and discovery by color coding
نویسندگان
چکیده
Protein-protein interaction (PPI) networks of many organisms share global topological features such as degree distribution, k-hop reachability, betweenness and closeness. Yet, some of these networks can differ significantly from the others in terms of local structures: e.g. the number of specific network motifs can vary significantly among PPI networks. Counting the number of network motifs provides a major challenge to compare biomolecular networks. Recently developed algorithms have been able to count the number of induced occurrences of subgraphs with k < or = 7 vertices. Yet no practical algorithm exists for counting non-induced occurrences, or counting subgraphs with k > or = 8 vertices. Counting non-induced occurrences of network motifs is not only challenging but also quite desirable as available PPI networks include several false interactions and miss many others. In this article, we show how to apply the 'color coding' technique for counting non-induced occurrences of subgraph topologies in the form of trees and bounded treewidth subgraphs. Our algorithm can count all occurrences of motif G' with k vertices in a network G with n vertices in time polynomial with n, provided k = O(log n). We use our algorithm to obtain 'treelet' distributions for k < or = 10 of available PPI networks of unicellular organisms (Saccharomyces cerevisiae Escherichia coli and Helicobacter Pyloris), which are all quite similar, and a multicellular organism (Caenorhabditis elegans) which is significantly different. Furthermore, the treelet distribution of the unicellular organisms are similar to that obtained by the 'duplication model' but are quite different from that of the 'preferential attachment model'. The treelet distribution is robust w.r.t. sparsification with bait/edge coverage of 70% but differences can be observed when bait/edge coverage drops to 50%.
منابع مشابه
Approximating the Number of Network Motifs
World Wide Web, the Internet, coupled biological and chemical systems, neural networks, and social interacting species, are only a few examples of systems composed by a large number of highly interconnected dynamical units. These networks contain characteristic patterns, termed network motifs, which occur far more often than in randomized networks with the same degree sequence. Several algorith...
متن کاملDevelopment of an Efficient Hybrid Method for Motif Discovery in DNA Sequences
This work presents a hybrid method for motif discovery in DNA sequences. The proposed method called SPSO-Lk, borrows the concept of Chebyshev polynomials and uses the stochastic local search to improve the performance of the basic PSO algorithm as a motif finder. The Chebyshev polynomial concept encourages us to use a linear combination of previously discovered velocities beyond that proposed b...
متن کاملA profile-based deterministic sequential Monte Carlo algorithm for motif discovery
MOTIVATION Conserved motifs often represent biological significance, providing insight on biological aspects such as gene transcription regulation, biomolecular secondary structure, presence of non-coding RNAs and evolution history. With the increasing number of sequenced genomic data, faster and more accurate tools are needed to automate the process of motif discovery. RESULTS We propose a d...
متن کاملA pr 2 01 2 BAYESIAN CENTROID ESTIMATION FOR MOTIF DISCOVERY
Biological sequences may contain patterns that are signal important biomolecular functions; a classical example is regulation of gene expression by transcription factors that bind to specific patterns in genomic promoter regions. In motif discovery we are given a set of sequences that share a common motif and aim to identify not only the motif composition, but also the binding sites in each seq...
متن کاملComparative Analysis of Regulatory Motif Discovery Tools for Transcription Factor Binding Sites
In the post-genomic era, identification of specific regulatory motifs or transcription factor binding sites (TFBSs) in non-coding DNA sequences, which is essential to elucidate transcriptional regulatory networks, has emerged as an obstacle that frustrates many researchers. Consequently, numerous motif discovery tools and correlated databases have been applied to solving this problem. However, ...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره 24 شماره
صفحات -
تاریخ انتشار 2008